Chapter 6 LAGOS Water Quality Analysis
6.1 Loading in data
6.1.1 First download and then specifically grab the locus (or site lat longs)
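#Load the packages used in this chapter
library(tidyverse) # dplyr verbs, ggplot2, and the %>% pipe
library(lubridate) # ymd()
library(sf) # st_as_sf()
library(mapview) # mapview()
library(LAGOSNE) # lagosne_get(), lagosne_load()
#Lagos download script (only needs to be run once)
#lagosne_get(dest_folder = LAGOSNE:::lagos_path(),overwrite=T)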
#Load in lagos
lagos <- lagosne_load()
#Grab the lake centroid info
lake_centers <- lagos$locus
# Make an sf object
spatial_lakes <- st_as_sf(lake_centers,coords=c('nhd_long','nhd_lat'),
crs=4326)
#Grab the water quality data
nutr <- lagos$epi_nutr
#Look at column names
#names(nutr)
6.1.2 Subset the columns of nutr to keep only the key info that we want
clarity_only <- nutr %>%
select(lagoslakeid,sampledate,chla,doc,secchi) %>%
mutate(sampledate = as.character(sampledate) %>% ymd(.))
6.1.3 Keep sites with at least 200 observations
#Look at the number of rows of dataset
#nrow(clarity_only)
chla_secchi <- clarity_only %>%
filter(!is.na(chla),
!is.na(secchi))
# How many observations did we lose?
filteredObservations = nrow(clarity_only) - nrow(chla_secchi)
# Keep only the lakes with at least 200 observations of secchi and chla
chla_secchi_200 <- chla_secchi %>%
group_by(lagoslakeid) %>%
mutate(count = n()) %>%
filter(count >= 200) # "at least 200" includes lakes with exactly 200
We lost 651095 observations because they were missing Secchi or chlorophyll data.
6.1.4 Join water quality data to spatial data
spatial_200 <- inner_join(spatial_lakes,chla_secchi_200 %>%
distinct(lagoslakeid,.keep_all=T),
by='lagoslakeid')
6.1.5 Mean Chl_a map
### Take the mean chl_a and secchi by lake
mean_values_200 <- chla_secchi_200 %>%
# Take summary by lake id
group_by(lagoslakeid) %>%
# take mean chl_a per lake id
summarize(mean_chla = mean(chla,na.rm=T),
mean_secchi=mean(secchi,na.rm=T)) %>%
#Get rid of NAs
filter(!is.na(mean_chla),
!is.na(mean_secchi)) %>%
# Take the log base 10 of the mean_chl
mutate(log10_mean_chla = log10(mean_chla))
#Join datasets
mean_spatial <- inner_join(spatial_lakes,mean_values_200,
by='lagoslakeid')
#Make a map
mapview(mean_spatial,zcol='log10_mean_chla')
6.2 Class work
6.2.1 1) What is the correlation between Secchi Disk Depth and Chlorophyll a for sites with at least 200 observations?
Here, I just want a plot of chla vs secchi for all sites
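#Your code here
# Per-lake means, for lakes with at least 200 observations
ggplot(mean_values_200) +
geom_point(aes(mean_secchi, mean_chla))
# Every paired chla/secchi observation across all sites
ggplot(chla_secchi) +
geom_point(aes(secchi, chla))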
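The question asks for a correlation, so the plots can be paired with a number. The following is a minimal sketch, not the assignment's answer: `toy_means` is a hypothetical stand-in for `mean_values_200`, built so that chlorophyll is exactly inversely proportional to Secchi depth. With the real data, the same `cor()` calls could be applied to `mean_values_200$mean_secchi` and `mean_values_200$mean_chla`.

```r
# Hypothetical per-lake means (NOT LAGOS data); chla = 40 / secchi by construction
toy_means <- data.frame(
  mean_secchi = c(0.5, 1, 2, 4, 8),
  mean_chla   = c(80, 40, 20, 10, 5)
)

# Pearson correlation on the raw values (sensitive to the curved relationship)
pearson_raw <- cor(toy_means$mean_secchi, toy_means$mean_chla)

# Pearson correlation after log10-transforming both variables
pearson_log <- cor(log10(toy_means$mean_secchi), log10(toy_means$mean_chla))

# Spearman rank correlation (only assumes a monotonic relationship)
spearman <- cor(toy_means$mean_secchi, toy_means$mean_chla, method = "spearman")

print(c(raw = pearson_raw, log10 = pearson_log, spearman = spearman))
```

Because the toy relationship is a power law, the log-log and rank correlations are both -1 here, while the raw Pearson value is weaker. Real lake data will be noisier, but the same contrast is one reason to look at the relationship on log axes.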
6.2.1.1 Why might this be the case?
Algae containing chlorophyll block light and obscure the Secchi disk, so as chlorophyll increases, Secchi depth decreases. Algae, along with aquatic weeds, are the primary producers of chlorophyll in lakes. Lakes with a higher nutrient load grow more algae and therefore hold more chlorophyll, and the same nutrient-rich inputs also tend to cloud the water and obscure the Secchi disk. Finally, deeper lakes tend to produce less photosynthetic algae, possibly because of increased mechanical mixing between water layers.
6.2.2 2) What states have the most data?
6.2.2.1 2a) First you will need to make a lagos spatial dataset that has the total number of observations per site.
## Get count for each lake
lago_summary = chla_secchi %>%
#slice(1:10000) %>%
group_by(lagoslakeid) %>%
summarize(
mean_chla = mean(chla,na.rm=T),
mean_secchi=mean(secchi,na.rm=T),
count=n()
)
## Join to lake location
lago_location_summary =
merge(
x = lago_summary,
y = lake_centers,
by = "lagoslakeid",
all.x = TRUE
) %>%
st_as_sf(coords=c('nhd_long','nhd_lat'),crs=4326)
mapview(lago_location_summary)
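Once each lake has a point geometry and an observation count, one way to get totals per state is a spatial join against state polygons. The sketch below only illustrates that idea under stated assumptions: `toy_lakes` is a hypothetical stand-in for `lago_location_summary`, and `toy_states` holds two made-up rectangles rather than real state boundaries, which would have to come from a boundary dataset of your choice.

```r
library(sf)
library(dplyr)

# Hypothetical lakes with observation counts (NOT LAGOS data)
toy_lakes <- data.frame(
  lagoslakeid = c(1, 2, 3),
  count       = c(250, 40, 10),
  nhd_long    = c(-93.5, -93.2, -89.5),
  nhd_lat     = c(45.5, 45.8, 44.5)
) %>%
  st_as_sf(coords = c("nhd_long", "nhd_lat"), crs = 4326)

# Helper that builds a closed rectangular polygon from its corner coordinates
make_box <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Hypothetical "states": two rectangles, not real boundaries
toy_states <- st_sf(
  state    = c("A", "B"),
  geometry = st_sfc(make_box(-95, 44, -92, 47),
                    make_box(-91, 43, -88, 46),
                    crs = 4326)
)

# Attach the containing state to each lake, then total lakes and observations
state_counts <- st_join(toy_lakes, toy_states) %>%
  st_drop_geometry() %>%
  group_by(state) %>%
  summarize(n_lakes = n(), n_obs = sum(count)) %>%
  arrange(desc(n_obs))

print(state_counts)
```

`st_join()` keeps every lake by default (a left join), so lakes that fall outside all polygons would show up with `state = NA` rather than being dropped silently.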